Comparing the hierarchy of author given tags and repository given tags in a large document archive
Folksonomies (large databases arising from collaborative tagging of items by
independent users) are becoming an increasingly important way of categorizing
information. In these systems users can tag items with free words, resulting in
a tripartite item-tag-user network. Although there are no prescribed relations
between tags, the way users think about the different categories presumably has
some built-in hierarchy, in which more specialized concepts are descendants of
more general categories. Several applications would benefit from knowledge of
this hierarchy. Here we apply a recent method to examine the differences and
similarities between hierarchies resulting from tags given by independent
individuals and from tags given by a centrally managed repository system. The
results from our method show substantial differences in the lower parts of the
hierarchies and, in contrast, relatively high similarity at the top of the
hierarchies.
Comment: 10 pages
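One simple way to make "similar at the top, different at the bottom" concrete is to compare two hierarchies level by level. The sketch below is a hypothetical illustration, not the comparison measure used in the paper. It assumes each hierarchy is given as a child-to-parent map, with roots mapped to `None`, and reports the Jaccard overlap of the tag sets found at each depth. The tag names are invented toy data.

```python
from typing import Optional

# Hypothetical representation: each tag maps to its parent tag (None for a root).
Hierarchy = dict[str, Optional[str]]


def depths(parent: Hierarchy) -> dict[str, int]:
    """Return the depth of every tag (roots have depth 0)."""
    cache: dict[str, int] = {}

    def depth(tag: str) -> int:
        if tag not in cache:
            p = parent[tag]
            cache[tag] = 0 if p is None else depth(p) + 1
        return cache[tag]

    for tag in parent:
        depth(tag)
    return cache


def levelwise_jaccard(h1: Hierarchy, h2: Hierarchy, max_level: int) -> list[float]:
    """Jaccard overlap of the tag sets found at each depth 0..max_level."""
    d1, d2 = depths(h1), depths(h2)
    scores = []
    for level in range(max_level + 1):
        a = {t for t, d in d1.items() if d == level}
        b = {t for t, d in d2.items() if d == level}
        union = a | b
        scores.append(len(a & b) / len(union) if union else 1.0)
    return scores


# Toy example: identical upper levels, different leaves.
author = {"science": None, "physics": "science", "biology": "science",
          "optics": "physics", "genetics": "biology"}
repository = {"science": None, "physics": "science", "biology": "science",
              "lasers": "physics", "genomics": "biology"}
print(levelwise_jaccard(author, repository, max_level=2))  # [1.0, 1.0, 0.0]
```

A per-level overlap ignores which parent a tag hangs under. A fuller comparison would also match parent-child links.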
Extracting tag hierarchies
Tagging items with descriptive annotations or keywords is a very natural way
to compress and highlight information about the properties of the given entity.
Over the years several methods have been proposed for extracting a hierarchy
between the tags for systems with a "flat", egalitarian organization of the
tags, which is very common when the tags correspond to free words given by
numerous independent people. Here we present a complete framework for automated
tag hierarchy extraction based on tag occurrence statistics. Along with
proposing new algorithms, we also introduce different quality measures
enabling a detailed comparison of competing approaches from different
aspects. Furthermore, we set up a synthetic, computer-generated benchmark
that provides a versatile testing tool, with a few tunable parameters
capable of generating a wide range of test beds. Besides the computer-generated
input, we also use real data in our studies, including a biological example with
a pre-defined hierarchy between the tags. The encouraging similarity between
the pre-defined and reconstructed hierarchy, as well as the seemingly
meaningful hierarchies obtained for other real systems indicate that tag
hierarchy extraction is a very promising direction for further research, with
great potential for practical applications.
Comment: 25 pages with 21 pages of supporting information, 25 figures
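Tag occurrence statistics alone can already suggest a hierarchy. The sketch below uses a deliberately simple heuristic that is an assumption made for illustration, not one of the algorithms proposed in the paper:

* Tags are ordered by decreasing frequency, with ties broken alphabetically.
* Each tag is attached to the earlier (more frequent) tag it co-occurs with most often.
* A tag that co-occurs with no earlier tag becomes a root.

Because parents always precede children in this total order, the result is guaranteed to be a forest. The function name `extract_hierarchy` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Optional


def extract_hierarchy(items: list[set[str]]) -> dict[str, Optional[str]]:
    """Attach each tag to its most strongly co-occurring, more frequent tag."""
    freq = Counter(tag for tags in items for tag in tags)
    cooc: Counter = Counter()
    for tags in items:
        for a, b in combinations(sorted(tags), 2):
            cooc[(a, b)] += 1
            cooc[(b, a)] += 1

    # Total order: more frequent tags first, ties broken alphabetically.
    ordered = sorted(freq, key=lambda t: (-freq[t], t))

    parent: dict[str, Optional[str]] = {}
    for i, tag in enumerate(ordered):
        best, best_count = None, 0
        # Only earlier tags are candidates; scanning in order means a tie
        # in co-occurrence goes to the more general (more frequent) tag.
        for other in ordered[:i]:
            if cooc[(tag, other)] > best_count:
                best, best_count = other, cooc[(tag, other)]
        parent[tag] = best  # None means the tag becomes a root
    return parent


items = [{"science", "physics"}, {"science", "physics", "optics"},
         {"science", "biology"}, {"physics", "optics"},
         {"science", "biology", "genetics"}, {"biology", "genetics"}]
print(extract_hierarchy(items))
# {'science': None, 'biology': 'science', 'physics': 'science', 'genetics': 'biology', 'optics': 'physics'}
```

Raw co-occurrence counts favour very frequent tags as parents. Practical methods usually normalise them, for example against a random-tagging baseline.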
Ontologies and tag-statistics
Due to the increasing popularity of collaborative tagging systems, the
research on tagged networks, hypergraphs, ontologies, folksonomies and other
related concepts is becoming an important interdisciplinary topic of great
timeliness and relevance for practical applications. In most collaborative
tagging systems the tagging by the users is completely "flat", while in some
cases they are allowed to define a shallow hierarchy for their own tags.
However, usually no overall hierarchical organisation of the tags is given, and
one of the interesting challenges of this area is to provide an algorithm
generating the ontology of the tags from the available data. In contrast, other
types of tagged networks are also available for research, in which the tags
are already organised into a directed acyclic graph (DAG) encoding an
"is a sub-category of" hierarchy among them. In this paper we
study how this DAG affects the statistical distribution of tags on the nodes
marked by the tags in various real networks. We analyse the relation between
the tag-frequency and the position of the tag in the DAG in two large
sub-networks of the English Wikipedia and a protein-protein interaction
network. We also study the tag co-occurrence statistics by introducing a 2d
tag-distance distribution preserving both the difference in the levels and the
absolute distance in the DAG for the co-occurring pairs of tags. Our most
interesting finding is that the local relevance of tags in the DAG (i.e.,
their rank or significance as characterised by, e.g., the length of the
branches starting from them) is much more important than their global distance
from the root. Furthermore, we also introduce a simple tagging model based on
random walks on the DAG, capable of reproducing the main statistical features
of tag co-occurrence.
Comment: Submitted to New Journal of Physics
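The 2d tag-distance distribution pairs two quantities for each co-occurring tag pair: the difference of their levels and their absolute distance in the DAG. The sketch below is one possible reading, based on two assumptions that the paper may define differently:

* The level of a tag is its shortest downward distance from the root.
* The absolute distance is the shortest path length when edge directions are ignored.

The DAG is assumed to be connected. The function names, the DAG and the tag sets are hypothetical toy data.

```python
from collections import Counter, deque
from itertools import combinations


def bfs_distances(adj: dict[str, list[str]], source: str) -> dict[str, int]:
    """Shortest path lengths (in edges) from `source` along the adjacency lists."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def tag_distance_histogram(children: dict[str, list[str]], root: str,
                           items: list[set[str]]) -> Counter:
    """Count (level difference, undirected DAG distance) over co-occurring tag pairs."""
    # Assumed level of a tag: shortest downward distance from the root.
    levels = bfs_distances(children, root)
    # Undirected copy of the DAG for the absolute distance.
    undirected: dict[str, list[str]] = {}
    for parent, kids in children.items():
        for kid in kids:
            undirected.setdefault(parent, []).append(kid)
            undirected.setdefault(kid, []).append(parent)
    hist: Counter = Counter()
    for tags in items:
        for a, b in combinations(sorted(tags), 2):
            distance = bfs_distances(undirected, a)[b]
            hist[(abs(levels[a] - levels[b]), distance)] += 1
    return hist


# Toy DAG: "A1" has two parents, so this is a DAG rather than a tree.
children = {"root": ["A", "B"], "A": ["A1", "A2"], "B": ["A1", "B1"]}
items = [{"A1", "A2"}, {"A", "A1"}, {"A1", "B1"}]
print(tag_distance_histogram(children, "root", items))
# Counter({(0, 2): 2, (1, 1): 1})
```

Keeping both coordinates distinguishes siblings, which have level difference 0 but distance 2, from direct parent-child pairs, which have level difference 1 and distance 1.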
Hierarchical networks of scientific journals
Academic journals are the repositories of mankind’s gradually accumulating
knowledge of the surrounding world. Just as knowledge is organized into classes
ranging from major disciplines, subjects and fields to increasingly specific
topics, journals can also be categorized into groups using various metrics. In
addition, they can be ranked according to their overall influence. However,
according to recent studies, the impact, prestige and novelty of journals cannot
be characterized by a single parameter such as, for example, the impact factor.
To improve our understanding of journal impact, we set out to evaluate journal
relevance using complex multi-dimensional measures. To this end, we organize
journals, for the first time, into multiple hierarchies based on citation data.
The two approaches we use are designed to address this problem from different
perspectives. We use a measure related to the notion of m-reaching centrality
and find a network that shows a journal’s level of influence in terms of the
direction and efficiency with which information spreads through the network. We
find that an alternative network can also be obtained by applying a suitably
modified nested hierarchy extraction method to the same data. In this case, the
journals self-organize into branches according to the major scientific fields,
and the local structure of each branch reflects the hierarchy within the given
field. The algorithm usually chooses the most prominent journal in the field
(according to other measures) as the local root, and more specialized journals
are positioned deeper in the branch. This can make navigation within scientific
fields and subfields very simple, equivalent to navigating the different
branches of the nested hierarchy. We expect this to be particularly helpful, for
example, when choosing the most appropriate journal for a given manuscript.
According to our results, the two alternative hierarchies give a somewhat
different but consistent picture of the intricate relations between scientific
journals and, as such, they also provide a new perspective on how scientific
knowledge is organized into networks.
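For reference, one common formulation of the m-reaching centrality of a node i in a directed network of N nodes is the fraction of the other N - 1 nodes that can be reached from i along directed edges in at most m steps. The sketch below computes it with a depth-limited breadth-first search. It assumes that edges point in the direction in which influence spreads. For citation data this direction is a modelling choice, not something fixed by the abstract above. The name `m_reaching_centrality` and the graph are hypothetical.

```python
from collections import deque


def m_reaching_centrality(out_edges: dict[str, list[str]], source: str, m: int) -> float:
    """Fraction of the other nodes reachable from `source` within at most m steps."""
    nodes = set(out_edges) | {v for targets in out_edges.values() for v in targets}
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == m:  # do not expand beyond m steps
            continue
        for v in out_edges.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    reached = len(dist) - 1  # exclude the source itself
    return reached / (len(nodes) - 1)


graph = {"A": ["B", "E"], "B": ["C"], "C": ["D"]}
for m in (1, 2, 3):
    print(m, m_reaching_centrality(graph, "A", m))
# 1 0.5
# 2 0.75
# 3 1.0
```

Nodes with high m-reaching centrality sit near the top of a flow hierarchy. Nodes with no outgoing edges, such as "D" here, score 0.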
Spectrum, Intensity and Coherence in Weighted Networks of a Financial Market
We construct a correlation-matrix-based financial network for a set of New
York Stock Exchange (NYSE) traded stocks with stocks corresponding to nodes and
the links between them added one after the other, according to the strength of
the correlation between the nodes. The eigenvalue spectrum of the correlation
matrix reflects the structure of the market, which also shows in the cluster
structure of the emergent network. The stronger and more compact a cluster is,
the earlier the eigenvalue representing the corresponding business sector
occurs in the spectrum. On the other hand, if groups of stocks belonging to a
given business sector are considered as a fully connected subgraph of the final
network, their intensity and coherence can be monitored as a function of time.
This approach indicates to what extent the business sector classifications are
visible in market prices, which in turn enables us to gauge the extent of
group behaviour exhibited by stocks belonging to a given business sector.
Comment: 10 pages, 3 figures
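The two ingredients described above can be sketched with NumPy:

* Links are ranked by decreasing correlation, which is the order in which they would be added to the network.
* The spectrum is taken from the symmetric correlation matrix.

For a sector treated as a fully connected subgraph, we assume the commonly used subgraph definitions. Intensity is the geometric mean of the link weights, and coherence is that geometric mean divided by the arithmetic mean. Both require strictly positive weights, which the sketch checks. The function names and the synthetic two-sector returns are hypothetical, not the NYSE data used in the paper.

```python
import numpy as np


def correlation_network(returns: np.ndarray):
    """Correlation matrix, descending eigenvalues and links ranked by strength.

    `returns` is a T x N array: rows are time steps, columns are stocks.
    """
    corr = np.corrcoef(returns, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh sorts ascending
    i, j = np.triu_indices_from(corr, k=1)       # each stock pair once
    order = np.argsort(-corr[i, j], kind="stable")
    links = [(int(i[k]), int(j[k]), float(corr[i[k], j[k]])) for k in order]
    return corr, eigenvalues, links


def intensity_and_coherence(weights: np.ndarray, members: list[int]):
    """Intensity (geometric mean) and coherence (geometric / arithmetic mean)
    of the fully connected subgraph spanned by `members`."""
    sub = weights[np.ix_(members, members)]
    a, b = np.triu_indices(len(members), k=1)
    w = sub[a, b]
    if np.any(w <= 0):
        raise ValueError("intensity is only defined for strictly positive weights")
    intensity = float(np.exp(np.mean(np.log(w))))
    coherence = intensity / float(np.mean(w))
    return intensity, coherence


# Synthetic example: a common "market" factor plus two hypothetical sectors.
rng = np.random.default_rng(0)
T = 500
market = rng.normal(size=(T, 1))
sector_a = rng.normal(size=(T, 1))
sector_b = rng.normal(size=(T, 1))
returns = np.hstack([
    market + sector_a + 0.5 * rng.normal(size=(T, 3)),
    market + sector_b + 0.5 * rng.normal(size=(T, 3)),
])
corr, eigenvalues, links = correlation_network(returns)
print(eigenvalues.round(2))                       # one large "market" eigenvalue
print(links[:3])                                  # strongest links come first
print(intensity_and_coherence(corr, [0, 1, 2]))   # sector A as a subgraph
```

Coherence equals 1 when all links in the subgraph are equally strong, and it drops when a few strong links dominate. Monitoring both quantities over sliding time windows gives the time dependence mentioned above.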
Note on the equivalence of the label propagation method of community detection and a Potts model approach
We show that the recently introduced label propagation method for detecting
communities in complex networks is equivalent to finding the local minima of a
simple Potts model. When applied to empirical data, the number of such local
minima was found to be very high, much larger than the number of nodes in the
graph. The aggregation method for combining information from multiple local
minima shows a tendency to fragment the communities into very small pieces.
Comment: 6 pages
Comparing the hierarchy of keywords in on-line news portals
The tagging of on-line content with informative keywords is a widespread
phenomenon from scientific article repositories through blogs to on-line news
portals. In most cases, the tags on a given item are free words chosen
by the authors independently. Therefore, the relations among keywords in a
collection of news items are unknown. However, in most cases the topics and
concepts described by these keywords form a latent hierarchy, with the
more general topics and categories at the top, and more specialised ones at the
bottom. Here we apply a recent co-occurrence-based tag hierarchy extraction
method to sets of keywords obtained from four different on-line news portals.
The resulting hierarchies show substantial differences not just in the topics
rendered as important (being at the top of the hierarchy) or of less interest
(categorised low in the hierarchy), but also in the underlying network
structure. This reveals discrepancies between the plausible keyword association
frameworks in the studied news portals.